Key concepts and definitions:
Theory, hypotheses, operationalization, and measurement

PSCI 2270 - Lecture 2

Georgiy Syunyaev

Department of Political Science, Vanderbilt University

September 3, 2024

Plan for this week


  1. Causal Theories

  2. From Theory to Hypothesis

  3. Operationalization of Theory

  4. Learning about Population from Sample

Plan for this week

  1. Causal Theories

Three Types of Empirical Questions




  • Descriptive: Summarize data, investigate facts, discover hidden patterns

  • Predictive: Forecast events based on co-occurrence with other events/factors

  • Causal: Answer what-if’s


What is Correlation? Causation? Confounder?


  • Correlation: any statistical association, though it commonly refers to the degree to which a pair of variables is linearly related.

  • Causation: indicates that one event is the result of the occurrence of the other event; i.e. there is a causal relationship between the two events. The events are referred to as cause and effect.

  • Confounder: (also confounding variable, omitted variable, or lurking variable) is a variable that influences both the dependent variable and independent variable, causing a spurious association.
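The point that correlation usually means linear association can be checked numerically. The sketch below uses made-up numbers and a hand-written `pearson` helper (both are illustrative assumptions, not course material): one outcome depends on \(X\) linearly, the other depends on \(X\) exactly but non-linearly.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]    # Y is an exact linear function of X
quadratic = [x ** 2 for x in xs]    # Y is an exact, but non-linear, function of X

r_linear = pearson(xs, linear)
r_quadratic = pearson(xs, quadratic)
print(round(r_linear, 3), round(r_quadratic, 3))
```

In the second case \(Y\) is fully determined by \(X\), yet the linear correlation is zero, so a correlation near zero does not by itself rule out a relationship.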

Why is this funny?


Seeing linear correlation



Seeing NO linear correlation



What could correlation mean?

  • Suppose there are two factors that we know are positively correlated with each other (e.g. when \(X\) is higher, \(Y\) tends to be higher too).

  • \(X\) usually denotes the independent (explanatory) variable; \(Y\) denotes the dependent (outcome) variable

Direct causation

  • Example: Coffee reduces the chances of cardiovascular disease (a broader summary of the evidence on popular health supplements is available here).

Confounding

  • Example: As ice cream sales increase, the rate of drowning deaths increases sharply. Therefore, ice cream consumption causes deaths.
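A small simulation can make the ice cream example concrete. It is only a sketch under invented assumptions: temperature is the confounder, the coefficients and noise levels are arbitrary, and ice cream sales have no effect on drownings by construction. `residualize` is a hypothetical helper that removes the part of a variable explained by a straight-line fit on temperature.

```python
import random

def mean(values):
    return sum(values) / len(values)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def residualize(ys, zs):
    """Remove the part of ys that a straight-line fit on zs explains."""
    my, mz = mean(ys), mean(zs)
    slope = (sum((z - mz) * (y - my) for z, y in zip(zs, ys))
             / sum((z - mz) ** 2 for z in zs))
    return [(y - my) - slope * (z - mz) for y, z in zip(ys, zs)]

rng = random.Random(2270)  # fixed seed so the run is reproducible

# Assumed data-generating process: temperature drives both variables.
temperature = [rng.uniform(0, 35) for _ in range(5000)]
ice_cream = [2.0 * t + rng.gauss(0, 10) for t in temperature]
drownings = [0.5 * t + rng.gauss(0, 3) for t in temperature]

raw_r = pearson(ice_cream, drownings)
adjusted_r = pearson(residualize(ice_cream, temperature),
                     residualize(drownings, temperature))

print(f"raw correlation: {raw_r:.2f}")
print(f"after accounting for temperature: {adjusted_r:.2f}")
```

The raw correlation is strongly positive, while the correlation between the two variables after temperature is accounted for is close to zero, which is what a purely confounded relationship looks like.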

Reverse causation

  • Example: The faster windmills are observed to rotate, the more wind we observe. Therefore wind is caused by the rotation of windmills.

Bidirectional causation

  • Example: Income inequality can lead to democratization, but also democratization can lead to change in income inequality.
  • IMPORTANT! Most bidirectional causal relationships can be represented as a recursive relationship, i.e. \(X_{t} \rightarrow Y_{t} \rightarrow X_{t+1}\), where \(t\) represents some time period

Everything Everywhere All at Once

  • TO MAKE THINGS WORSE! Multiple types of relationships can be present at the same time, making it even harder to see whether a causal relationship exists and what its magnitude is.

Spurious correlation (by chance)

  • Example: Cristiano Ronaldo’s domestic league goal tally predicts the price of gold between 2004 and 2014 (here and random here).
  • HINT! Try to think about the mechanism.
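Why such chance correlations are easy to find can be sketched with a simulation. Everything here is an assumption for illustration: the "target" series is a random walk standing in for 11 yearly values, and the 1,000 candidate series are independent random walks that share no mechanism with it.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def random_walk(rng, length):
    """Cumulative sum of independent standard normal steps."""
    level, path = 0.0, []
    for _ in range(length):
        level += rng.gauss(0, 1)
        path.append(level)
    return path

rng = random.Random(7)  # fixed seed so the run is reproducible

target = random_walk(rng, 11)                             # stand-in for 11 yearly observations
candidates = [random_walk(rng, 11) for _ in range(1000)]  # unrelated series

# Search the unrelated series for the one that "predicts" the target best.
best_r = max(abs(pearson(target, c)) for c in candidates)
print(f"strongest correlation found by searching: {best_r:.2f}")
```

With short series and many candidates, searching is almost guaranteed to turn up a strong correlation, which is why asking for a plausible mechanism matters.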

Is it causal?


Claim: Internet searches for the word “flu” increase the incidence of flu in the city from which the search arose.

Evidence: The more people in a city who do a Google search for the word “flu”, the more cases of flu there tend to be in that city.

Claim: Smoking causes lung cancer.

Evidence: People who smoke are more likely to contract lung cancer than people who don’t smoke.

Claim: Cell phone access increases violent protest.

Evidence: When a region in Africa gets cell phone coverage the frequency of violent political protests in the next year goes up.

Is it causal?


Claim: Experience of civil war causes people to develop more violent personalities.

Evidence: The more years of civil war a country has experienced since 1945, the more yellow and red cards its nationals get in club and international soccer matches.

Claim: Giving a student high grades causes them to perform better on standardized tests.

Evidence: Teenagers with higher high school GPAs get better scores on their SATs.

Claim: Super Bowl appearances are bad for health in the team’s home town.

Evidence: If you live in a town whose team makes it to the Super Bowl you are more likely to die from the flu in that year.

Is it causal?

Caution: Big Data and Prediction



  • Google Flu Trends: Predicting flu outbreaks and other public health issues from the volume of flu-related Google searches.
  • Moneyball: Based on a book about how the Oakland Athletics baseball team used analytics and evidence-based data to assemble a competitive team. It abandoned old predictors of success, such as runs batted in, for overlooked ones, like on-base percentage.
  • Simulated humans: Simulate individual survey responses using ChatGPT to approximate real life responses to anything (!) at no cost.

Plan for this week

  2. From Theory to Hypothesis

Directed Acyclic Graphs (DAGs)

  • A DAG displays assumptions about the relationship between variables (nodes).

    • The assumptions we make take the form of lines (edges) going from one node to another.
    • Edges are directed, meaning that each has a single (!) arrowhead indicating the direction of the effect.
  • DAGs explain causality in terms of counterfactuals. A causal effect is defined as a comparison between two states of the world.
  • In DAG notation, causality runs in one direction. Specifically, it runs forward in time. There are no cycles in a DAG.
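The "no cycles" requirement can be checked mechanically. The sketch below stores a graph as a dictionary from each node to the nodes it points to and tests for cycles by repeatedly removing nodes with no incoming edges. The node names are hypothetical and are not taken from the diagrams in this lecture.

```python
def is_acyclic(graph):
    """Return True if a directed graph (dict: node -> list of children) has no cycle."""
    nodes = set(graph) | {c for children in graph.values() for c in children}
    indegree = {node: 0 for node in nodes}
    for children in graph.values():
        for child in children:
            indegree[child] += 1

    # Repeatedly remove nodes that have no remaining incoming edges.
    frontier = [node for node in nodes if indegree[node] == 0]
    removed = 0
    while frontier:
        node = frontier.pop()
        removed += 1
        for child in graph.get(node, []):
            indegree[child] -= 1
            if indegree[child] == 0:
                frontier.append(child)

    # If some nodes could never be removed, they sit on a cycle.
    return removed == len(nodes)

# Hypothetical theory: ideology affects both media choice and vote choice.
dag = {
    "ideology": ["partisan_media", "vote_choice"],
    "partisan_media": ["vote_choice"],
}

# Adding an arrow back from vote choice to media creates a cycle.
not_a_dag = dict(dag, vote_choice=["partisan_media"])

print(is_acyclic(dag), is_acyclic(not_a_dag))
```

A feedback loop like the second graph can be drawn as a DAG only by indexing the nodes by time period, so that every arrow still points forward in time.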

Effects of Partisan Media

Page and Jones (1979)


Markus and Converse (1979)


Do NOT do THIS! 😵

Plan for this week

  3. Operationalization of Theory

Concepts vs. Indicators

  • Theories are made up of concepts (nodes):

    • Inequality, civil discourse, media consumption, political knowledge, outgroup contact, views on immigration
    • When creating a diagram of theory (DAGs) we took those for granted
  • Concepts are latent:

    • We almost never observe “concepts”
    • Instead we rely on “indicators” or “proxies”
  • Indicators are concrete:

    • Concrete measure of a latent concept
    • Sometimes they’re “good,” sometimes they’re “rough”

Sometimes there is slippage

  • Important to consider how we construct indicators

    • Some more straightforward: what is your age? how often do you watch TV?
    • Others more complicated: political self-efficacy? racial discrimination?
    • Have to create an operational definition of a concept to make it into a variable in our dataset
  • Sometimes there is slippage between latent concept and proxy, e.g.

    • Responses to a specific policy question about affirmative action as a proxy for “racial resentment”
    • Outcomes measured via self-reports may be clouded by social desirability bias (e.g., self-reported voter turnout)
  • Important to make measurement as unobtrusive as possible

Example



  • Concept: presidential approval.
  • Conceptual definition: Extent to which US adults support the actions and policies of the current US president.
  • Operational definition (Indicator):

    • “On a scale from 1 to 5, where 1 is least supportive and 5 is most supportive, how much would you say you support the job that Donald Trump is doing as president?”

Measurement Error

  • Reliability:

    • Receive same answer over repeated measurements
    • Individual measurement = exact value + chance error
    • Chance errors tend to cancel out when we average across large sample
  • Validity:

    • Avoid systematic errors (bias) that push measurements in the same direction
    • Individual measurement = exact value + chance error + bias
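The two decompositions above can be illustrated with a short simulation. The exact value, the size of the chance error, and the bias are all invented numbers; the point is only to show which part averaging removes.

```python
import random

rng = random.Random(42)  # fixed seed so the run is reproducible

exact_value = 50.0   # assumed true value of the quantity being measured
bias = 5.0           # assumed systematic error, always in the same direction
n = 10_000

# Individual measurement = exact value + chance error
unbiased = [exact_value + rng.gauss(0, 10) for _ in range(n)]

# Individual measurement = exact value + chance error + bias
biased = [exact_value + rng.gauss(0, 10) + bias for _ in range(n)]

mean_unbiased = sum(unbiased) / n
mean_biased = sum(biased) / n

print(f"average without bias: {mean_unbiased:.1f}")
print(f"average with bias:    {mean_biased:.1f}")
```

Averaging over a large sample brings the first mean close to the exact value, but the second mean stays off by roughly the size of the bias no matter how large the sample is.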

Biased Poll

  • What would you do better? Can we still salvage these poll results?

Preventing Bias


  • 2002 WHO survey of people in China and Mexico about political self-efficacy

Question: “How much say do you have in getting the government to address issues that interest you?”

  1. No say at all
  2. Little say
  3. Some say
  4. A lot of say
  5. Unlimited say
  • Problem?

    • Different people interpret questions differently
    • Cross-cultural differences, vague questions

Anchoring Vignettes


  • Solution: Anchor responses with vignettes that describe different levels of “objective” efficacy, and ask the respondent to rate each vignette on the same scale

    • Alison lacks clean drinking water. She and her neighbors are supporting an opposition candidate in the forthcoming elections that has promised to address the issue. It appears that so many people in her area feel the same way that the opposition candidate will defeat the incumbent representative.
    • Jane lacks clean drinking water because the government is pursuing an industrial development plan. In the campaign for an upcoming election, an opposition party has promised to address the issue, but she feels it would be futile to vote for the opposition since the government is certain to win.
    • Moses lacks clean drinking water. He would like to change this, but he can’t vote, and feels that no one in the government cares about this issue. So he suffers in silence, hoping something will be done in the future.
  • “Objective” ranking: Alison \(>\) Jane \(>\) Moses

  • Place the respondent on the scale relative to how they rated the vignettes
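One simple way to place a respondent relative to the vignettes is to recode the self-rating by where it falls among that respondent's own vignette ratings. The sketch below is an illustration under assumptions, not necessarily the procedure used in the survey: `relative_position` is a hypothetical helper, the ratings are invented, and respondents whose vignette ratings are tied or out of order are rejected rather than handled.

```python
def relative_position(self_rating, vignette_ratings):
    """Recode a self-rating relative to the respondent's own vignette ratings.

    vignette_ratings must be ordered from the lowest to the highest
    "objective" level (here: Moses, Jane, Alison) and strictly increasing.
    With J vignettes the result runs from 1 (below all of them) to 2J + 1
    (above all of them); even values mean a tie with a vignette.
    """
    pairs = zip(vignette_ratings, vignette_ratings[1:])
    if any(low >= high for low, high in pairs):
        raise ValueError("vignette ratings are tied or out of order")

    position = 1
    for rating in vignette_ratings:
        if self_rating < rating:
            return position
        if self_rating == rating:
            return position + 1
        position += 2
    return position

# Two hypothetical respondents who both answer "Some say" (3) about themselves
# but use the 1-5 scale differently when rating Moses, Jane, and Alison.
respondent_a = relative_position(3, [1, 2, 4])
respondent_b = relative_position(3, [2, 4, 5])

print(respondent_a, respondent_b)
```

Although both respondents gave the same raw answer, the first places themselves between Jane and Alison and the second between Moses and Jane, so the recoded values are comparable in a way the raw answers are not.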

Preventing Bias


  • Survey across Arab Countries about perceptions of gender equality:

Question: “Do you agree or disagree with the following statement: Men are better leaders than women?”

  1. Strongly Agree
  2. Agree
  3. Disagree
  4. Strongly Disagree
  • Problem?

    • This captures only part of the concept
    • Cross-cultural differences

Aggregation

  • Sample across Arab World asked about equality:

    • Men better leaders
    • Be president/prime minister
    • Work outside the home
    • University for boys not girls
    • Equal job opportunities
    • Equal wages
    • Travel abroad alone
  • Combine this information:

    • Create an additive index
    • What do the numbers mean?
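An additive index can be sketched as follows. The item names, the 1 to 4 coding (1 = Strongly Agree, 4 = Strongly Disagree), and the choice of which items to reverse are assumptions for illustration, not the survey's actual coding. Items where agreement signals support for equality are reversed first, so that a higher score always means a more egalitarian answer.

```python
SCALE_MIN, SCALE_MAX = 1, 4  # assumed coding: 1 = Strongly Agree ... 4 = Strongly Disagree

# Hypothetical items where agreeing is the egalitarian answer; these are reversed.
REVERSED_ITEMS = {"equal_wages", "equal_job_opportunities"}

def additive_index(responses):
    """Sum item scores after recoding so that higher always means more egalitarian."""
    total = 0
    for item, answer in responses.items():
        if item in REVERSED_ITEMS:
            answer = SCALE_MIN + SCALE_MAX - answer  # 1 <-> 4, 2 <-> 3
        total += answer
    return total

def rescale(index, n_items):
    """Map the raw index onto 0-1 so that it does not depend on the number of items."""
    lowest, highest = n_items * SCALE_MIN, n_items * SCALE_MAX
    return (index - lowest) / (highest - lowest)

# One hypothetical respondent.
responses = {
    "men_better_leaders": 4,        # strongly disagrees
    "university_for_boys": 3,       # disagrees
    "equal_wages": 1,               # strongly agrees
    "equal_job_opportunities": 2,   # agrees
}

raw_index = additive_index(responses)
scaled_index = rescale(raw_index, len(responses))
print(raw_index, round(scaled_index, 2))
```

The raw sum depends on how many items are included and how they are coded, which is one reason to ask what the numbers mean; rescaling to 0-1 makes scores easier to compare, but it still treats every item as equally important.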

Unit of Analysis


  • Every concept requires a unit of analysis

    • Individual level: Characteristics of individuals, documents, news reports, etc.
    • Aggregate level: Groups of people, districts, countries, sets of documents, time periods, etc.
  • Many concepts can be measured at multiple levels, e.g. if we want to measure wealth:

    • At the individual level: Income? From wages? From capital gains? Assets? Consumer products? Calories consumed?

    • At the country level: GDP? GDP/capita? Energy consumption? Infant mortality rate?

Next Time

  • Learning about Population from Sample

  • Descriptive Statistics

  • Types of Data Collection

References

Markus, Gregory B., and Philip E. Converse. 1979. “A Dynamic Simultaneous Equation Model of Electoral Choice.” The American Political Science Review 73 (4): 1055–70. https://doi.org/10.2307/1953989.
Page, Benjamin I., and Calvin C. Jones. 1979. “Reciprocal Effects of Policy Preferences, Party Loyalties and the Vote.” The American Political Science Review 73 (4): 1071–89. https://doi.org/10.2307/1953990.